Language processing method and apparatus

ABSTRACT

When a Japanese or Korean bunsetsu-phrase lattice is given, that is, a set of linguistic units called bunsetsu-phrases with various starting and ending positions, together with numerical values representing the reliability of each bunsetsu-phrase, the optimum sequences of bunsetsu-phrases as Japanese or Korean clauses or sentences are selected under a criterion of the degree of acceptability of a clause or sentence, with a considerably small amount of computation. The syntactic structures and the degrees of acceptability of the optimum sequences are calculated at the same time. The gist of the invention is that processing proceeds from shorter sequences to longer ones, storing the results for shorter sequences and reusing them later in the calculation for longer sequences, thus systematically avoiding repeated calculation of the same partial quantities.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a language processing method and apparatus applicable to a speech recognizer for recognizing Japanese or Korean speech that is pronounced either continuously or with a pause between bunsetsu-phrases. The present invention is also applicable to a Japanese word processor of the type in which a Japanese sentence written only in Kana (the Japanese syllabary) is converted into a Japanese sentence written orthographically using both Kana and Kanji (Chinese characters).

In the above language processing method, when a set of bunsetsu-phrase candidates having various starting and ending positions in the phonetic expression, that is, a bunsetsu-phrase lattice, is given, a bunsetsu-phrase sequence that is optimum as a Japanese or Korean clause or sentence is selected from the candidate set, taking into consideration both the reliability of each bunsetsu-phrase and the degree of dependency between two bunsetsu-phrases. The optimum dependency structure of the selected bunsetsu-phrase sequence as a Japanese or Korean clause or sentence, and the degree of acceptability of the dependency structure thus obtained, are also calculated.

In the Japanese and Korean languages, there is a class of words called "independent words", such as nouns, verbs, adjectives and adverbs, and a class of words called "dependent words", such as auxiliary verbs and particles. In this specification, a linguistic unit called a "bunsetsu-phrase" is an independent word followed by some (possibly zero) suitable dependent words. An example of a bunsetsu-phrase is a noun followed by a particle indicating a case. A suitably conjugated form of a verb followed by an auxiliary verb is another example.

2. Description of the Prior Art

The selection of an optimum bunsetsu-phrase sequence from a bunsetsu-phrase lattice is an important problem that appears in various situations of Japanese or Korean language processing. Consider, for instance, a Japanese word processor of the type in which a Japanese sentence written as a non-segmented Kana string is converted into a sentence written orthographically using both Kana and Kanji. To obtain a good conversion result, the following two kinds of ambiguity must be resolved:

(A) ambiguity in segmenting the given Kana string into a sequence of substrings corresponding to bunsetsu-phrases, and

(B) ambiguity arising from the homonymity of each substring obtained as a result of the above segmentation.

An ideal way of resolving these ambiguities would be to select an optimum sequence of bunsetsu-phrases, taking into consideration both the acceptability of the sequence as a Japanese sentence and the reliability of each bunsetsu-phrase, from the bunsetsu-phrase lattice determined by all the possible segmentations of the given Kana string and by all the possible bunsetsu-phrases corresponding to each segment. However, only the enumeration method has been known to solve this problem, and since the enumeration method is impracticable because of its enormous amount of computation, the following non-optimal methods have been used.

(1) The two-bunsetsu-longest-coincidence method is a well-known method of segmenting a Kana string into a sequence of bunsetsu-phrases (Hiroshi Makino and Makoto Kizawa: "Automatic Segmentation for Transformation of Kana into Kanji", Trans. Inf. Proc. Soc. Japan, Vol. 20, No. 4, pp. 337-345 (1979)). In this method, the total length of two adjacent bunsetsu-phrases is used as a measure of the goodness of a segmentation, and the boundary between two segments that permits the longest interpretation as two adjacent bunsetsu-phrases is adopted as a segmenting point.

(2) Another method is the least BUNSETSU's number method (Kenji Yoshimura, Tooru Hitaka and Sho Yoshida: "Morphological Analysis of Non-Marked-Off Japanese sentence by the least BUNSETSU's number method", Trans. Inf. Proc. Soc. Japan, Vol. 24, No. 1, pp. 40-46 (1983)). In this method, the segmentation that yields the sequence with the fewest bunsetsu-phrases is regarded as the best segmentation.

In the above two methods, the strategy of searching for a good segmentation is based on heuristics, and there is no clear justification for the definition of optimality employed. Furthermore, they have no means of resolving the ambiguity arising from the homonymity of each segment, so the best they can do is to list the candidate bunsetsu-phrases in order of plausibility, leaving the selection of the appropriate one to the user, just like a conventional Japanese word processor that accepts only segmented Kana string input.

(3) There is also a method in which a given Kana string is first segmented by the above-mentioned two-bunsetsu-longest-coincidence method, and the sequence of segments is then parsed from the standpoint of dependency between bunsetsu-phrases in order to find the syntactically best bunsetsu-phrase for each segment (Hiroshi Makino and Makoto Kizawa: "An Automatic Transformation System of Non-Segmented Kana Sentences into Kanji-Kana Sentences and its Homonym Analysis", Trans. Inf. Proc. Soc. Japan, Vol. 22, No. 1, pp. 59-67 (1981)).

This method is superior to methods (1) and (2) above in that the structure of the Japanese language is used in selecting the bunsetsu-phrase sequence from the set of bunsetsu-phrase candidates. However, because the two-bunsetsu-longest-coincidence method is itself heuristic, the optimality of the segmentation is not always ensured. Furthermore, the assumption, adopted to simplify processing, that a dependency relation holds between the nearest bunsetsu-phrases as long as it does not violate the rules of dependency is not always satisfied in practice.

(4) There also exists a method that is, in principle, close to selecting the best bunsetsu-phrase sequence from a bunsetsu-phrase lattice (Yoshimitsu Oshima, Masahiro Abe, Katsuhiko Yuura and Nobuyuki Takeuchi: "A Disambiguation method in Kana-to-Kanji Conversion Using Case Frame Grammar", Trans. Inf. Proc. Soc. Japan, Vol. 27, No. 7, pp. 679-687 (1986)).

In this method, however, the selection must be made by the enumeration method, which is computationally infeasible. Therefore, the number of possible bunsetsu-phrase sequences must be limited using local information prior to global parsing, resulting in a loss of global optimality.

A similar problem of optimum bunsetsu-phrase sequence selection is also encountered in speech recognition.

One conceivable way of recognizing continuously spoken Japanese or Korean speech would be to detect possible bunsetsu-phrase segments, recognize them, and list them as bunsetsu-phrase candidates. Usually these segments are located at various time positions and overlap with each other, and there is ambiguity resulting from the homonymity of each segment. Since there is, in addition, another kind of ambiguity arising from the uncertainty of recognition, the resulting bunsetsu-phrase lattice is more complicated than the one in the word processor case described above. To obtain the final recognition result, an optimum bunsetsu-phrase sequence must be selected from such a complicated bunsetsu-phrase lattice. However, only the enumeration method or its modifications have so far been available for this problem, and a more efficient method has therefore been desired.

SUMMARY OF THE INVENTION

Suppose that a string of phonetic symbols is given. If we cut out the substrings of the string that are recognized as bunsetsu-phrases and list all the homonymous bunsetsu-phrases corresponding to each substring, we obtain a set of bunsetsu-phrases with various starting and ending positions in the original string of phonetic symbols. Such a set of bunsetsu-phrases is called a bunsetsu-phrase lattice. A bunsetsu-phrase lattice is also obtained as the output of a continuous speech recognizer that spots and recognizes bunsetsu-phrase segments in a speech signal. In terms of the bunsetsu-phrase lattice, the problem addressed by the present invention may be stated as follows:

"Given a bunsetsu-phrase lattice, select the most appropriate sequence of bunsetsu-phrases, taking into account both its acceptability as a Japanese or Korean clause or sentence and the reliability of each bunsetsu-phrase, from all the possible sequences of bunsetsu-phrases satisfying the condition that the ending position of each bunsetsu-phrase is immediately followed by the starting position of the succeeding bunsetsu-phrase."

In order to describe the difficulty of solving the above problem and the objects of the present invention, we begin with a rigorous mathematical formulation of the problem.

Here, to improve the readability of long mathematical expressions that include "min", we use

    min(a condition on x)[f(x)]

instead of the conventional

    $$\min_{\text{a condition on } x} f(x).$$

When there is no risk of confusion, the brackets [ and ] may be omitted. Similar notation is used for argmin, Σ and ∪ as well, where

    argmin(a condition on x)[f(x)]

denotes the value of x that minimizes f(x) under the condition in the parentheses; Σ denotes a sum and ∪ denotes a union of sets.

A Japanese or Korean clause or sentence is built up from modification relationships, in a broad sense, between bunsetsu-phrases. For instance,

[S1]

Tsukueno ueni aru radiowa kino kaimashita.

(The radio on the desk was bought yesterday.)

In this case, "Tsukueno" (the desk) modifies "ueni" (on); "ueni" modifies "aru" (exists or is placed); "aru" modifies "radiowa" (radio); "radiowa" modifies "kaimashita" (bought); and "kino" (yesterday) modifies "kaimashita". A Japanese sentence is constructed in this way. Similarly, in the Korean sentence [S2], for instance, the bunsetsu-phrase meaning "the desk" modifies the one meaning "on"; the one meaning "on" modifies the one meaning "exists or is placed"; that one modifies the one meaning "radio"; the one meaning "radio" modifies the one meaning "bought"; and the one meaning "yesterday" modifies the one meaning "bought". A Korean sentence is constructed in this way.

When a bunsetsu-phrase x modifies a bunsetsu-phrase y, x is said to depend on y, and y is said to receive x. Such a modifier-modified relationship is called a "dependency" in this specification.

For a sequence of bunsetsu-phrases to form a Japanese or Korean clause or sentence, the following conditions are necessary:

[C1] Every bunsetsu-phrase except the last one must depend on one and only one of the succeeding bunsetsu-phrases.

[C2] A dependency between two bunsetsu-phrases does not cross any other dependency between two other bunsetsu-phrases.

The conditions [C1] and [C2] can be expressed by the "dependency structure" defined below:

[D1] (1) If x is a bunsetsu-phrase, then <x> is a dependency structure.

(2) If X₁, X₂, . . . , X_(m) are dependency structures and x is a bunsetsu-phrase, then <X₁ X₂ . . . X_(m) x> is a dependency structure. When X₁ = < . . . x₁ >, X₂ = < . . . x₂ >, . . . , X_(m) = < . . . x_(m) > are dependency structures, <X₁ X₂ . . . X_(m) x> signifies a dependency structure in which x₁, x₂, . . . , x_(m) depend on x.

[D2]

When a sequence of bunsetsu-phrases x₁ x₂ . . . x_(n) is suitably marked with angle brackets so that it becomes a dependency structure, the result is called a dependency structure on x₁ x₂ . . . x_(n).

The set of all the dependency structures on x₁ x₂ . . . x_(n) is denoted by

K(x₁ x₂ . . . x_(n)).

Note that a bunsetsu-phrase sequence can have many dependency structures on it. For instance,

<<Tsukueno><ueni><aru><radiowa><kino>kaimashita>

<<<Tsukueno>ueni><aru><radiowa><kino>kaimashita>

<<<<<<Tsukueno>ueni>aru>radiowa>kino>kaimashita>

<<<<<Tsukueno>ueni>aru>radiowa><kino>kaimashita>

are some of the dependency structures on Tsukueno ueni aru radiowa kino kaimashita. It does not follow, however, that all of these dependency structures represent appropriate Japanese sentences. Therefore, a degree of acceptability of a dependency structure is defined as follows.

First, the degree to which a bunsetsu-phrase x depends on a bunsetsu-phrase y is assumed to be expressed by the following function, which takes non-negative values:

    PEN(x,y).

In conventional Japanese grammar, a bunsetsu-phrase x has been considered either able or unable to depend on a bunsetsu-phrase y; that is, PEN(x,y) could take only the value 0 or 1. In the present invention, however, PEN(x,y) is permitted to take continuous values. Furthermore, the closer the value of PEN(x,y) is to 0, the higher the degree of dependency. How to determine the function PEN is a very important problem, but since it does not form part of the present invention, it is not described in this specification.

The degree of acceptability P(X) of a dependency structure X is defined recursively, using PEN, as follows:

[D3]

(1) If

    X=<x> (where x is a bunsetsu-phrase), then P(X)=0,

and

(2) If

    $$X=\langle X_1X_2\ldots X_m\,x\rangle,\quad X_1=\langle\ldots x_1\rangle,\ X_2=\langle\ldots x_2\rangle,\ \ldots,\ X_m=\langle\ldots x_m\rangle,$$

then

    $$P(X)=P(X_1)+P(X_2)+\cdots+P(X_m)+\mathrm{PEN}(x_1,x)+\mathrm{PEN}(x_2,x)+\cdots+\mathrm{PEN}(x_m,x).$$

The value of P(X) thus defined is the sum of the values of PEN over all of the dependencies in X.
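A minimal Python sketch of [D3], offered only as an illustration: it assumes a hypothetical representation in which a dependency structure <X₁ X₂ . . . X_(m) x> is a pair (children, head) and <x> is ((), x), and it uses an invented toy PEN table. The labels stand for distinct bunsetsu-phrase objects. The function `dependencies` (also a name chosen here) lists the dependencies of X, so one can check that P(X) is the sum of PEN over them.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical representation: <X1 X2 ... Xm x> is the pair (children, head),
# where children is a tuple of such pairs and head is the bunsetsu-phrase x.
Structure = Tuple[tuple, str]


def leaf(x: str) -> Structure:
    """The one-phrase structure <x>."""
    return ((), x)


def acceptability(X: Structure, pen: Callable[[str, str], float]) -> float:
    """P(X) by [D3]: 0 for <x>, else sum over children of P(X_k) + PEN(x_k, x)."""
    children, x = X
    # Xk[1] is x_k, the last (governing) bunsetsu-phrase of the child X_k
    return sum(acceptability(Xk, pen) + pen(Xk[1], x) for Xk in children)


def dependencies(X: Structure) -> Iterator[Tuple[str, str]]:
    """Yield every dependency (x_k, x) contained in X."""
    children, x = X
    for Xk in children:
        yield from dependencies(Xk)
        yield (Xk[1], x)


ab = ((leaf("a"),), "b")            # <<a>b>
X = ((ab, leaf("c")), "d")          # <<<a>b><c>d>: a->b, b->d, c->d
toy_pen = {("a", "b"): 1.0, ("b", "d"): 0.5, ("c", "d"): 2.0}  # invented values
print(acceptability(X, lambda y, x: toy_pen[(y, x)]))          # 3.5
```

Because P(X) only sums PEN over dependencies, any representation that records who depends on whom would serve equally well; the nested form is chosen because it mirrors the bracket notation.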

When X and Y are dependency structures such that Y = <Y₁ Y₂ . . . Y_(m) y> (where Y₁, . . . , Y_(m) are dependency structures and y is a bunsetsu-phrase), the dependency structure obtained by inserting X between the left bracket and the first dependency structure of Y,

    $$\langle X\,Y_1Y_2\ldots Y_m\,y\rangle,$$

is denoted by X⊕Y. Then, the following two propositions hold.

[E1]

For dependency structures X=< . . . x> and Y=< . . . y>,

    P(X⊕Y)=P(X)+P(Y)+PEN(x,y).

[E2]

For an arbitrary bunsetsu-phrase sequence x₁ x₂ . . . x_(n),

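(1) when n=1,

    $$K(x_1)=\{\langle x_1\rangle\},$$

(2) when n≧2,

    $$K(x_1x_2\ldots x_n)=\bigcup_{1\le t\le n-1}\{Y\oplus X\mid Y\in K(x_1\ldots x_t),\ X\in K(x_{t+1}\ldots x_n)\}.$$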
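The recursion in [E2] can be turned directly into a generator of K(x₁ x₂ . . . x_(n)). The Python sketch below is illustrative only: it reuses a hypothetical (children, head) representation, implements ⊕ as "prepend X to the children of Y", and memoizes K on index ranges; the names `oplus` and `all_structures` are assumptions of this sketch. For the six bunsetsu-phrases of [S1] it should produce 42 distinct structures.

```python
from functools import lru_cache
from typing import List, Sequence


def oplus(X, Y):
    """X ⊕ Y: insert X as the first child of Y = <Y1 ... Ym y>."""
    return ((X,) + Y[0], Y[1])


def all_structures(phrases: Sequence) -> List:
    """K(x1 ... xn) generated by [E2]; structures are (children, head) pairs."""
    xs = tuple(phrases)

    @lru_cache(maxsize=None)
    def K(i: int, j: int) -> tuple:           # structures on xs[i:j]
        if j - i == 1:
            return (((), xs[i]),)             # (1): K(x) = {<x>}
        return tuple(oplus(Y, X)              # (2): Y on the prefix, X on the rest
                     for t in range(i + 1, j)
                     for Y in K(i, t)
                     for X in K(t, j))

    return list(K(0, len(xs)))


words = ["Tsukueno", "ueni", "aru", "radiowa", "kino", "kaimashita"]
print(len(all_structures(words)))             # 42
```

Each structure arises from exactly one split point t, so the counts follow the recursion knum(n) = Σ knum(t)·knum(n-t), which yields the Catalan numbers 1, 1, 2, 5, 14, 42, . . .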

[D4]

For sets of bunsetsu-phrases A₁, A₂, . . . , A_(m), we define

    $$KB(A_1,A_2,\ldots,A_m)=\{X\mid X\in K(x_1x_2\ldots x_m),\ x_1\in A_1,\ x_2\in A_2,\ \ldots,\ x_m\in A_m\}$$

KB(A₁, A₂, . . . , A_(m)) represents all the dependency structures on all the bunsetsu-phrase sequences formed by selecting one bunsetsu-phrase from each of A₁, A₂, . . . , A_(m) and concatenating them.

[D5]

(1) An m-segmentation of an integer sequence i, i+1, . . . , j is an integer tuple (s₀, s₁, s₂, . . . , s_(m)) satisfying

    $$i-1=s_0<s_1<s_2<\cdots<s_m=j.$$

(2) The set of all the m-segmentations of an integer sequence i, i+1, . . . , j is denoted by

    $$D_m(i,j)=\{(s_0,s_1,s_2,\ldots,s_m)\mid i-1=s_0<s_1<s_2<\cdots<s_m=j\}$$

(3) The set D(i,j) of the segmentations of an integer sequence i, i+1, . . . , j is defined as follows:

    $$D(i,j)=\bigcup_{1\le m\le j-i+1}D_m(i,j)$$
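The segmentation sets of [D5] are straightforward to enumerate. In the Python sketch below (the function name `segmentations` is hypothetical), the interior points s₁, . . . , s_(m-1) are drawn from i, . . . , j-1. Since each of these j-i interior positions is either a segmenting point or not, D(i,j) has 2^(j-i) elements, which is one reason the enumeration method grows exponentially.

```python
from itertools import combinations
from typing import Iterator, Optional, Tuple


def segmentations(i: int, j: int, m: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield D_m(i, j) of [D5](2), or D(i, j) of [D5](3) when m is None.

    Each element is (s0, s1, ..., sm) with i - 1 = s0 < s1 < ... < sm = j;
    the interior points s1, ..., s(m-1) are drawn from i, ..., j - 1.
    """
    sizes = range(j - i + 1) if m is None else [m - 1]   # number of interior points
    for c in sizes:
        if c < 0:                     # m < 1 is not a valid segmentation size
            continue
        for inner in combinations(range(i, j), c):
            yield (i - 1, *inner, j)


print(list(segmentations(1, 3)))   # [(0, 3), (0, 1, 3), (0, 2, 3), (0, 1, 2, 3)]
```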

Now, the following situation will be considered.

[J1] The positions of phonetic symbols are represented by the natural numbers 1 through N. For each i, j (1≦i≦j≦N), a set of bunsetsu-phrases B(i,j) is given, each of which has the starting position i and the ending position j. Furthermore, a non-negative real number S(x) is attached to each bunsetsu-phrase x.

Bunsetsu-phrases xεB(i,j) and yεB(i',j') (i≠i' or j≠j') are treated as different objects even if they coincide as bunsetsu-phrases.

The set of the B(i,j) (1≦i≦j≦N) is called a bunsetsu-phrase lattice. Consider a Japanese word processor in which a non-segmented Japanese sentence written in Kana is converted into a Japanese sentence written in Kana and Kanji. When a Kana string a₁ a₂ . . . a_(N) is given, B(i,j) is the set of all the bunsetsu-phrases having the substring a_(i) a_(i+1) . . . a_(j) as their Kana expression. The value of S(x) is determined from such information as word frequency; a smaller value of S(x) represents a higher degree of reliability of x. In continuous speech recognition, B(i,j) is the set of bunsetsu-phrases output as recognition candidates for the segment with starting position i and ending position j. In this case, S(x) represents the degree of reliability of the recognition result x as determined by the speech recognizer; most speech recognizers are constructed to output such a value together with the recognition result. In either case, there may be no bunsetsu-phrase with starting position i and ending position j, so that B(i,j) may be empty. To avoid treating this as an exception, when B(i,j) is empty a dummy bunsetsu-phrase is added to it, and the value of S for the dummy bunsetsu-phrase is defined as infinity. Furthermore, when at least one of x and y is a dummy bunsetsu-phrase, the value of PEN(x,y) is also defined as infinity.

Moreover, S is extended so that it can be applied to a dependency structure: when XεK(x_(i) x_(i+1) . . . x_(j)), S(X) is defined as follows:

    $$S(X)=\sum_{i\le m\le j}S(x_m)$$

Under these conditions, the problem of the present invention may be stated as follows:

When a segmentation of the phonetic symbol positions 1, 2, . . . , N

    $$(s_0,s_1,s_2,\ldots,s_m)\in D(1,N)$$

is selected, a sequence of sets of bunsetsu-phrases

    $$B(s_0+1,s_1),\ B(s_1+1,s_2),\ \ldots,\ B(s_{m-1}+1,s_m)$$

is determined by the selected segmentation. Furthermore, if a bunsetsu-phrase x_(k) is selected from each set of bunsetsu-phrases B(s_(k-1) +1, s_(k)), the set of dependency structures

    $$K(x_1x_2\ldots x_m)$$

is determined.

Furthermore, when a dependency structure X is selected from K(x₁ x₂ . . . x_(m)), the sum of the degree of acceptability and the degree of reliability is determined as follows:

    P(X)+S(X)

Therefore, from all the possible segmentations, bunsetsu-phrase sequences and dependency structures mentioned above, it is necessary to select the segmentation, the bunsetsu-phrase sequence and the dependency structure that minimize the value of P(X)+S(X). That is, the problem is to obtain the following minimum value and the values of the variables that attain it:

    $$\min_{(s_0,s_1,\ldots,s_m)\in D(1,N)}\ \min_{x_k\in B(s_{k-1}+1,\,s_k)\ (1\le k\le m)}\ \min_{X\in K(x_1x_2\ldots x_m)}\big[P(X)+S(X)\big]$$

In order to select the optimum bunsetsu-phrase sequence, it is necessary to take into consideration whether a bunsetsu-phrase sequence is acceptable as a Japanese or Korean clause or sentence. As a result, the problem becomes one of obtaining the optimum dependency structure as well as the optimum bunsetsu-phrase sequence, as described above. On the other hand, once the optimum dependency structure is obtained, the bunsetsu-phrase sequence composing it is determined. We therefore rewrite the above problem as problem [P1] below, in which the dependency structure is the main variable, using the fact that

    $$\min_{(s_0,\ldots,s_m)\in D(1,N)}\ \min_{x_k\in B(s_{k-1}+1,\,s_k)}\ \min_{X\in K(x_1\ldots x_m)}\big[P(X)+S(X)\big]=\min_{X\in\bigcup_{(s_0,\ldots,s_m)\in D(1,N)}KB(B(s_0+1,s_1),\ldots,B(s_{m-1}+1,s_m))}\big[P(X)+S(X)\big].$$

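[P1] Find the values of (1) and (2):

    (1) $$\min_{X\in\bigcup_{(s_0,\ldots,s_m)\in D(1,N)}KB(B(s_0+1,s_1),\ldots,B(s_{m-1}+1,s_m))}\big[P(X)+S(X)\big]$$

    (2) $$\operatorname{argmin}_{X\in\bigcup_{(s_0,\ldots,s_m)\in D(1,N)}KB(B(s_0+1,s_1),\ldots,B(s_{m-1}+1,s_m))}\big[P(X)+S(X)\big]$$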

Up to the present, enumeration has been the only available method of solving this problem: P(X)+S(X) had to be calculated for all the dependency structures X in the set

    $$\bigcup_{(s_0,\ldots,s_m)\in D(1,N)}KB(B(s_0+1,s_1),B(s_1+1,s_2),\ldots,B(s_{m-1}+1,s_m))$$

in order to find the minimum value and the dependency structure that attains it. When the number of bunsetsu-phrases in each B(i,j) is M and the whole length of the string of phonetic symbols is N, the number of elements of the above set is given by

    [G1] $$\sum_{1\le n\le N}{}_{N-1}C_{n-1}\cdot M^{n}\cdot\mathrm{knum}(n)$$

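where _(N-1)C_(n-1) is a binomial coefficient and knum(n) is the number of dependency structures on a bunsetsu-phrase sequence of length n, which can be calculated as follows:

    $$\mathrm{knum}(1)=1,\qquad\mathrm{knum}(n)=\sum_{1\le t\le n-1}\mathrm{knum}(t)\,\mathrm{knum}(n-t)=\frac{(2n-2)!}{n!\,(n-1)!}\quad(n\ge 2).$$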

The values of [G1] for several M and N are shown in TABLE 1.

TABLE 1

The numbers of elements in the set ∪((s₀,s₁, . . . ,s_(m))εD(1,N))[KB(B(s₀+1,s₁), B(s₁+1,s₂), . . . , B(s_(m-1)+1,s_(m)))]

______________________________________
 N        M = 5            M = 10
______________________________________
 5     5.8 × 10^4       1.6 × 10^6
10     8.0 × 10^10      6.3 × 10^13
15     1.7 × 10^17      3.9 × 10^21
20     4.6 × 10^23      2.9 × 10^29
______________________________________

Here, the number of elements in B(i,j) is assumed to be the same for all i and j (1≦i≦j≦N) and is denoted by M.
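As a check on TABLE 1, [G1] can be evaluated directly. This Python sketch assumes that knum(n) is the Catalan number (2n-2)!/(n!(n-1)!), which follows from the recursion in [E2]; the function names `knum` and `g1` are illustrative. For N=5 it gives 57855 (M=5) and 1612410 (M=10), which round to the entries of the first row.

```python
from math import comb


def knum(n: int) -> int:
    """Number of dependency structures on n bunsetsu-phrases (a Catalan number)."""
    return comb(2 * n - 2, n - 1) // n


def g1(N: int, M: int) -> int:
    """[G1]: number of structures the enumeration method must examine."""
    return sum(comb(N - 1, n - 1) * M**n * knum(n) for n in range(1, N + 1))


for N in (5, 10, 15, 20):
    print(f"N={N:2d}  M=5: {g1(N, 5):.1e}  M=10: {g1(N, 10):.1e}")
```

Exact integer arithmetic is used, and formatting to one decimal place matches the precision of the table.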

It can be seen that with the enumeration method the number of arithmetic operations grows rapidly as N increases and becomes extremely large even for moderate values of N, so that it is extremely difficult to apply the enumeration method to the above problem in practice.

It is an object of the present invention to overcome the defects encountered in the prior-art methods.

More specifically, it is an object of the present invention to provide a language processing method that is remarkably efficient compared with the prior-art methods, in that the number of arithmetic operations is of polynomial order in the length of the phonetic symbol string and in the number of elements of each bunsetsu-phrase set.

It is another object of the present invention to provide a language processing apparatus that performs the language processing method of the present invention efficiently, overcoming the defects of the prior art.

The present invention can clearly be applied not only to Japanese or Korean but also to any language whose grammatical structure is based on dependency relations between words or phrases.

1. Theoretical Foundations of the Invention

1.1. Fundamental Recurrence Equations

1.1.1. The Case of General Bunsetsu-Phrase Lattice

Before describing the present invention, we present the recurrence equations that play fundamental roles in it. First, the following definitions are made:

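[D6] For a fixed natural number N, and for i, j, m, x satisfying 1≦i≦m≦j≦N and xεB(m,j), we define

    (1) $$\mathrm{OPTPS}(i,j,m,x)=\min_{X\in\bigcup_{(s_0,\ldots,s_p)\in D(i,m-1)}KB(B(s_0+1,s_1),\ldots,B(s_{p-1}+1,s_p),\{x\})}\big[P(X)+S(X)\big],$$

    (2) $$\mathrm{OPTKS}(i,j,m,x)=\operatorname{argmin}_{X\in\bigcup_{(s_0,\ldots,s_p)\in D(i,m-1)}KB(B(s_0+1,s_1),\ldots,B(s_{p-1}+1,s_p),\{x\})}\big[P(X)+S(X)\big].$$

When m=i, D(i,m-1) is not defined. In this case, the following convention is adopted:

    $$\bigcup_{(s_0,\ldots,s_p)\in D(i,i-1)}KB(B(s_0+1,s_1),\ldots,B(s_{p-1}+1,s_p),\{x\})=\{\langle x\rangle\}.$$

In case (2), there are in general many dependency structures X that minimize P(X)+S(X), so OPTKS(i,j,m,x) is a set.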

In the present invention, the following two recurrence equations, [T1] for OPTPS and [T2] for OPTKS, play the fundamental roles.

    [T1]

    For 1≦i≦m≦j≦N, xεB(m,j),

    (1) if i=m,

then

    OPTPS(i,j,m,x)=S(x),

    and

    (2) if i<m,

then

    OPTPS(i,j,m,x)

    =min(i≦n≦k≦m-1)[min(yεB(n,k))

    [OPTPS(i,k,n,y)+PEN(y,x)]+OPTPS(k+1,j,m,x)].

[T2]

    For 1≦i≦m≦j≦N, xεB(m,j),

    (1) if i=m,

then

    OPTKS (i,j,m,x)={<x>},

and

    (2) if i<m,

then

    OPTKS(i,j,m,x)

    =∪((n,k,y)εKTS(i,j,m,x))

    [{X⊕Y|XεOPTKS(i,k,n,y),YεOPTKS(k+1,j,m,x)}],

where KTS(i,j,m,x) is the set of all the triplets (n,k,y) that attain the minimum in (2) of [T1].

Next, these recurrence equations are proved. To this end, the following proposition [E3] is shown first: ##EQU9##

(Proof)

    Let $Z\in\bigcup_{(s_0,s_1,s_2,\ldots,s_p)\in D(i,m-1)}\big[KB(B(s_0+1,s_1),B(s_1+1,s_2),\ldots,B(s_{p-1}+1,s_p),\{x\})\big]$;

    then there exist $(s_0,s_1,s_2,\ldots,s_p)\in D(i,m-1)$

    and $x_1\in B(s_0+1,s_1),\ x_2\in B(s_1+1,s_2),\ \ldots,\ x_p\in B(s_{p-1}+1,s_p)$

such that

    $$Z\in K(x_1x_2\ldots x_p\,x).$$

According to [E2], there exist t satisfying 1≦t≦p, and Y, X satisfying

    $$Y\in K(x_1x_2\ldots x_t),\quad X\in K(x_{t+1}x_{t+2}\ldots x_p\,x)$$

such that

    Z=Y⊕X.

    Let $n=s_{t-1}+1,\ k=s_t,\ y=x_t$;

then ##EQU10## This shows that the left side of the equation to be proved is included in the right side. That the right side is included in the left side can be shown in substantially the same way.

Next, [T1] is proved using [E3].

(Proof of [T1]) ##EQU11##

Thus [T1] is proved.

(Proof of [T2]) Since (1) is obvious from the definition, only (2) will be shown.

First, it should be noted that for X satisfying

    $$X\in\bigcup_{(s_0,s_1,s_2,\ldots,s_p)\in D(i,m-1)}\big[KB(B(s_0+1,s_1),B(s_1+1,s_2),\ldots,B(s_{p-1}+1,s_p),\{x\})\big],$$

    XεOPTKS(i,j,m,x) is equivalent to

    P(X)+S(X)=OPTPS(i,j,m,x).

In order to prove (2), it suffices to show

    (a) OPTKS(i,j,m,x)

    ⊇∪((n,k,y)εKTS(i,j,m,x))

    [{Y⊕X|YεOPTKS(i,k,n,y), XεOPTKS(k+1,j,m,x)}]

    and

    (b) OPTKS(i,j,m,x)

    ⊆∪((n,k,y)εKTS(i,j,m,x))

    [{Y⊕X|YεOPTKS(i,k,n,y), XεOPTKS(k+1,j,m,x)}].

Now (a) can be shown as follows:

Let

    Zε∪((n,k,y)εKTS(i,j,m,x))

    [{Y⊕X|YεOPTKS(i,k,n,y), XεOPTKS(k+1,j,m,x)}],

then there exist

    (n, k, y)εKTS(i,j,m,x),

    YεOPTKS(i,k, n, y),

    XεOPTKS(k+1,j,m,x)

    such that Z=Y⊕X.

Obviously, ##EQU12## Therefore

    ZεOPTKS(i,j,m,x).

In order to show (b), let

    ZεOPTKS(i,j,m,x).

Since ##EQU13## according to [E3] there exist n, k satisfying i≦n≦k≦m-1, and y, Y, X satisfying yεB(n,k), ##EQU14## such that

    Z=Y⊕X

On the other hand,

    P(Z)+S(Z)=P(Y)+P(X)+S(Y)+S(X)+PEN(y,x),

and since Z minimizes the left side of this equation, Y and X must also minimize the right side. Therefore, as is apparent from the proof of [T1],

    (n, k, y)εKTS(i,j,m,x)

and

    YεOPTKS(i,k,n,y), XεOPTKS(k+1,j,m,x).

Hence, ##EQU15## Thus, [T2] is proved.

OPTPS(i,j,m,x) has four variables i, j, m and x. Since m is the starting position of the bunsetsu-phrase x, m is uniquely determined by x. Therefore, if we define

[D7]

    (1) IN(x) = (the starting position of bunsetsu-phrase x),

    (2) B̄(i,j)=∪(i≦k≦j)B(k,j), the set of bunsetsu-phrases that end at j and start at or after i,

and

    (3) for xεB̄(i,j),

    OPT(i,j,x)=OPTPS(i,j,IN(x),x),

then [T1] may be rewritten as follows:

[T1']

    For 1≦i≦j≦N and xεB̄(i,j),

    (1) if i=IN(x),

then

    OPT(i,j,x)=S(x)

and

    (2) if i<IN(x),

then

    OPT(i,j,x)=min(i≦k≦IN(x)-1)

    [min(yεB̄(i,k))[OPT(i,k,y)+PEN(y,x)]

    +OPT(k+1,j,x)].

As to OPTKS, we define similarly

[D8]

    For 1≦i≦j≦N and xεB̄(i,j),

    OPTK(i,j,x)=OPTKS(i,j,IN(x),x).

Furthermore, we denote by KT(i,j,x) the set of pairs (k,y) that attain the minimum in (2) of [T1']. Then [T2] may be rewritten as follows:

[T2']

    For 1≦i≦j≦N, xεB̄(i,j)

    (1) if i=IN(x),

then

    OPTK(i,j,x)={<x>}

and

    (2) if i<IN(x),

then

    OPTK(i,j,x)

    =∪((k,y)εKT(i,j,x))

    [{Y⊕X|YεOPTK(i,k,y), XεOPTK(k+1,j,x)}].

When (k,y) is an element of KT(i,j,x), k is called an optimum segmenting point for i, j and x, and y is called the optimum bunsetsu-phrase corresponding to k.

1.1.2. The Case in Which the Bunsetsu-Phrase Lattice Becomes a Sequence of Bunsetsu-Phrase Sets

As is apparent from the definitions of OPT(i,j,x) and OPTK(i,j,x), a value of the first variable is significant only if it equals 1, or if it equals both the starting position of some non-dummy bunsetsu-phrase and 1 plus the ending position of some non-dummy bunsetsu-phrase. After removing the irrelevant values, we can therefore renumber the values that the first variable can take, beginning with 1. In the same way, a value of the second variable is significant only if it equals N, or if it equals both the ending position of some non-dummy bunsetsu-phrase and the starting position of some non-dummy bunsetsu-phrase minus 1. We can therefore likewise renumber, beginning with 1, the values that the second variable can take after removing the irrelevant values.

An example in which this renumbering of the phonetic symbol positions is very effective is described below. Assume that, for the phonetic symbol positions 1 through N, there exists a segmentation 0=s₀<s₁< . . . <s_(n)=N such that every bunsetsu-phrase in the bunsetsu-phrase lattice starts at s_(i-1)+1 and ends at s_(i) for some i (1≦i≦n). In this case, the bunsetsu-phrase lattice consists of a sequence of bunsetsu-phrase sets B(s₀+1,s₁), B(s₁+1,s₂), . . . , B(s_(n-1)+1,s_(n)). The output of a word processor or speech recognizer that accepts bunsetsu-phrases separated by spaces or pauses takes this form. In this case, only s₀+1, s₁+1, . . . , s_(n-1)+1 are significant as values of the first variable of OPT(i,j,x) and OPTK(i,j,x), and only s₁, s₂, . . . , s_(n) are significant as values of the second variable. When n is replaced by N and these significant values are renumbered from 1 to N, the first and second variables of OPT(i,j,x) and OPTK(i,j,x) take values from 1 to N, and these numbers indicate the order of the bunsetsu-phrase sets.

Let

    $$B_i=B(s_{i-1}+1,s_i)\quad(1\le i\le N),$$

then, as is apparent from the above description, [T1'] and [T2'] may be rewritten as follows:

[T1"]

    For 1≦i≦j≦N and xεB_(j),

    (1) if i=j,

then

    OPT(i,j,x)=S(x),

and

    (2) if i<j,

then

    OPT(i,j,x)

    =min(i≦k≦j-1)[min(yεB_(k))[OPT(i,k,y)

    +PEN(y,x)]+OPT(k+1,j,x)].

[T2"]

    For 1≦i≦j≦N, xεB_(j)

    (1) if i=j,

then

    OPTK(i,j,x)={<x>},

and

    (2)if i<j,

then

    OPTK(i,j,x)

    =∪((k,y)εKT(i,j,x))[{Y⊕X|YεOPTK(i,k,y),

    XεOPTK(k+1,j,x)}],

where KT(i,j,x) is the set of pairs (k,y) that attain the minimum in (2) of [T1"].

1.2. Methods for Determining the Value of OPT and the Pair of Optimum Segmenting Point and Optimum Bunsetsu-Phrase

1.2.1. The Case of General Bunsetsu-Phrase Lattice

Part (1) of [T1'] states that when i=IN(x), the value of OPT(i,j,x) is S(x), while part (2) states that when i<IN(x), if OPT(i,k,y) and OPT(k+1,j,x) (i≦k≦IN(x)-1, yεB̄(i,k)) have already been calculated, the value of OPT(i,j,x) can be obtained by solving a one-variable minimization problem twice. Because of these facts, the calculation of OPT(i,j,x) (1≦i≦j≦N, xεB̄(i,j)) can proceed from intervals with i=j to larger intervals that include the previous ones. In the process of determining the value of OPT(i,j,x), the pairs of optimum segmenting point and optimum bunsetsu-phrase are also determined. This phase of calculation is completed when the value of OPT(1,N,x) has been calculated for each xεB̄(1,N).
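The interval-by-interval order described above can be sketched in Python as follows. This is an illustrative implementation of [T1'] under assumptions of our own: the lattice is a dict from (i, j) to {label: S(x)}, a phrase is the triple (start, end, label) so that identical labels at different positions stay distinct, a missing B(i,j) plays the role of the dummy phrase with S = ∞, and PEN is any caller-supplied non-negative function. The name `forward_pass` and the toy data are hypothetical, and only one optimum pair is kept per entry of KT. The nested loops are polynomial in N and in the number of phrases, in contrast to [G1].

```python
import math
from typing import Callable, Dict, Tuple

Phrase = Tuple[int, int, str]  # (starting position, ending position, label)


def forward_pass(N: int,
                 lattice: Dict[Tuple[int, int], Dict[str, float]],
                 pen: Callable[[Phrase, Phrase], float]):
    """OPT(i, j, x) by [T1'] over a general bunsetsu-phrase lattice.

    lattice[(i, j)] maps each label in B(i, j) to S(x); an absent B(i, j)
    acts like the dummy phrase with S = infinity. Returns (OPT, KT), where
    KT[(i, j, x)] is one optimum pair (k, y), or None if i == IN(x) or if
    no finite value exists.
    """
    S = {(a, b, lab): s for (a, b), labs in lattice.items() for lab, s in labs.items()}
    ending = {j: [x for x in S if x[1] == j] for j in range(1, N + 1)}
    OPT, KT = {}, {}
    for length in range(1, N + 1):                # shorter intervals first
        for i in range(1, N - length + 2):
            j = i + length - 1
            for x in ending[j]:
                m = x[0]                           # IN(x)
                if m < i:                          # x is not in B-bar(i, j)
                    continue
                if m == i:                         # part (1) of [T1']
                    OPT[(i, j, x)], KT[(i, j, x)] = S[x], None
                    continue
                best, arg = math.inf, None         # part (2) of [T1']
                for k in range(i, m):              # i <= k <= IN(x) - 1
                    tail = OPT[(k + 1, j, x)]      # shorter interval, already known
                    for y in ending[k]:
                        if y[0] < i:               # y must lie in B-bar(i, k)
                            continue
                        v = OPT[(i, k, y)] + pen(y, x) + tail
                        if v < best:
                            best, arg = v, (k, y)
                OPT[(i, j, x)], KT[(i, j, x)] = best, arg
    return OPT, KT


# Toy lattice over positions 1..3 with invented S and PEN values
lattice = {(1, 1): {"a": 1.0}, (1, 2): {"ab": 1.0},
           (2, 3): {"bc": 1.0}, (3, 3): {"c": 1.0}}


def toy_pen(y: Phrase, x: Phrase) -> float:
    return 0.0 if (y[2], x[2]) == ("ab", "c") else 2.0


OPT, KT = forward_pass(3, lattice, toy_pen)
best = min(v for (i, j, _), v in OPT.items() if (i, j) == (1, 3))
print(best, KT[(1, 3, (3, 3, "c"))])   # 2.0 (2, (1, 2, 'ab'))
```

The overall minimum is the smallest OPT(1,N,x) over the phrases x ending at N; in the toy lattice it is 2.0, reached by <<ab>c>.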

1.2.2. The Case of a Sequence of Bunsetsu-Phrase Sets

Part (1) of [T1"] shows that when i=j, OPT(i,j,x) is determined to be the value of S(x), while part (2) of [T1"] shows that when i<j, if OPT(i,k,y) and OPT(k+1,j,x) (i≦k≦j-1, $y\in B_k$) have already been calculated, OPT(i,j,x) can be obtained by solving a one-variable minimization problem twice (first over y, then over k). By these facts, the calculation of OPT(i,j,x) (1≦i≦j≦N, $x\in B_j$) can proceed from intervals with i=j to larger intervals that include the previous intervals. In the process of determining the value of OPT(i,j,x), the pairs of optimum segmenting point and optimum bunsetsu-phrase are also determined. When the value of OPT(1,N,x) has been calculated for each $x\in B_N$, this phase of calculation is completed.
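A minimal Python sketch of this bottom-up calculation, following the loop order of the flowchart in item 2.2 (j ascending, i descending from j, then q) and keeping every tied (k, p) pair as suggested in item 2.4. The function name fill_tables, the callables S and PEN, and the dictionary-based tables are illustrative assumptions, not the patented apparatus; the two nested minimizations are merged into one loop over (k, p), which yields the same minimum.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[int, int, int]   # (i, j, q); q indexes B_j from 0


def fill_tables(
    sets: Sequence[Sequence[str]],
    S: Callable[[str], float],
    PEN: Callable[[str, str], float],
) -> Tuple[Dict[Key, float], Dict[Key, List[Tuple[int, int]]]]:
    """Return (TABLE1, TABLE2) for non-empty sets B_1..B_N = sets[0..N-1].

    TABLE1[(i, j, q)] = OPT(i, j, x_(j),q); TABLE2[(i, j, q)] lists every
    optimum pair (k, p). Smaller S and PEN are taken to be better."""
    N = len(sets)
    table1: Dict[Key, float] = {}
    table2: Dict[Key, List[Tuple[int, int]]] = {}
    for j in range(1, N + 1):                  # column j: 1 .. N
        for i in range(j, 0, -1):              # row i: j .. 1
            for q, x in enumerate(sets[j - 1]):
                if i == j:                     # (1) of [T1"]
                    table1[(i, j, q)] = S(x)
                    continue
                best, pairs = math.inf, []
                for k in range(i, j):          # segmenting point k: i .. j-1
                    right = table1[(k + 1, j, q)]  # row k+1 > i, already filled
                    for p, y in enumerate(sets[k - 1]):
                        # column k < j, already filled
                        v = table1[(i, k, p)] + PEN(y, x) + right
                        if v < best:
                            best, pairs = v, [(k, p)]
                        elif v == best:
                            pairs.append((k, p))   # keep ties (item 2.4)
                table1[(i, j, q)] = best
                table2[(i, j, q)] = pairs
    return table1, table2
```

The degree of acceptability is then the minimum of table1[(1, N, q)] over q, and the pairs in table2 drive the decomposition described in item 1.3.2.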

1.3. Methods for Determining the Optimum Dependency Structure and the Degree of Acceptability of This Structure

For simplicity of explanation, the case in which the pair of the optimum segmenting point and the optimum bunsetsu-phrase is uniquely determined is described first. In this case, OPTK(i,j,x) contains only one dependency structure.

1.3.1. The Case of General Bunsetsu-Phrase Lattice

The degree of acceptability of the optimum dependency structure on the optimum bunsetsu-phrase sequence is the smallest value of OPT(1,N,x) over all possible last bunsetsu-phrases x, so it can be obtained by calculating

$$\min_{x\in B(1,N)}\big[\mathrm{OPT}(1,N,x)\big].$$

Furthermore, let

$$x_0=\mathop{\mathrm{argmin}}_{x\in B(1,N)}\big[\mathrm{OPT}(1,N,x)\big];$$

then the optimum dependency structure on the optimum bunsetsu-phrase sequence is given by

$$\mathrm{OPTK}(1,N,x_0).$$

The structure of OPTK(1,N,x₀) is determined as follows. If IN(x₀)=1, then according to (1) of [T2'],

$$\mathrm{OPTK}(1,N,x_0)=\langle x_0\rangle,$$

so the optimum dependency structure is determined. If IN(x₀)≠1, then according to (2) of [T2'],

$$\mathrm{OPTK}(1,N,x_0)=\mathrm{OPTK}(1,k_1,x_1)\oplus\mathrm{OPTK}(k_1+1,N,x_0),$$

where k₁ is the optimum segmenting point and x₁ is the optimum bunsetsu-phrase for 1, N, x₀. If IN(x₁)≠1, by using the optimum segmenting point k₂ and the optimum bunsetsu-phrase x₂ for 1, k₁, x₁, OPTK(1,k₁,x₁) can be further decomposed as follows:

$$\mathrm{OPTK}(1,k_1,x_1)=\mathrm{OPTK}(1,k_2,x_2)\oplus\mathrm{OPTK}(k_2+1,k_1,x_1).$$

If IN(x₀)≠k₁+1, by using the optimum segmenting point k₃ and the optimum bunsetsu-phrase x₃ for k₁+1, N, x₀, OPTK(k₁+1,N,x₀) can also be decomposed as follows:

$$\mathrm{OPTK}(k_1+1,N,x_0)=\mathrm{OPTK}(k_1+1,k_3,x_3)\oplus\mathrm{OPTK}(k_3+1,N,x_0).$$

Such decomposition operations are carried out until IN(x)=i holds for every OPTK(i,j,x) that appears in the process. Each OPTK(i,j,x) in which IN(x)=i holds is replaced by the dependency structure consisting of only one bunsetsu-phrase according to (1) of [T2'], and then the insertion operations are carried out in the reverse order of the decomposition operations. Thus, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon are obtained simultaneously.

When there is more than one pair of the optimum segmenting point and the optimum bunsetsu-phrase, the same operations are carried out for all such pairs, and OPTK(1,N,x₀) consists of all the dependency structures thus obtained.
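The following Python sketch combines item 1.2.1 and this item for a general bunsetsu-phrase lattice: it fills OPT and KT bottom-up and then decomposes OPTK(1,N,x₀) recursively, returning every optimum structure when ties occur. Because ⊕ is not defined in this section, the sketch assumes that a dependency structure can be represented as the bunsetsu-phrase sequence plus a set of (dependent, head) arcs, and that Y ⊕ X joins the two parts and adds the arc from the last phrase y of Y to x, which is the pair scored by the PEN(y,x) term. The function name best_structures and the (start, end, label) phrase layout are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Phrase = Tuple[int, int, str]                   # (IN(x), ending position, label)
Structure = Tuple[Tuple[int, ...], FrozenSet[Tuple[int, int]]]  # (ids, arcs)


def best_structures(
    phrases: Sequence[Phrase], N: int,
    S: Callable[[Phrase], float], PEN: Callable[[Phrase, Phrase], float],
) -> Tuple[float, List[Structure]]:
    """Minimum degree of acceptability and all optimum structures over 1..N."""
    by_end: Dict[int, List[int]] = {j: [] for j in range(1, N + 1)}
    for x, (_, end, _) in enumerate(phrases):
        by_end[end].append(x)

    def B(i: int, j: int) -> List[int]:          # start >= i, end == j
        return [x for x in by_end[j] if phrases[x][0] >= i]

    opt: Dict[Tuple[int, int, int], float] = {}
    kt: Dict[Tuple[int, int, int], List[Tuple[int, int]]] = {}
    for j in range(1, N + 1):
        for i in range(j, 0, -1):
            for x in B(i, j):
                start = phrases[x][0]
                if start == i:                   # (1) of [T1']
                    opt[(i, j, x)] = S(phrases[x])
                    continue
                best, pairs = math.inf, []
                for k in range(i, start):        # i <= k <= IN(x) - 1
                    right = opt[(k + 1, j, x)]
                    for y in B(i, k):
                        v = opt[(i, k, y)] + PEN(phrases[y], phrases[x]) + right
                        if math.isinf(v):
                            continue             # no complete sequence here
                        if v < best:
                            best, pairs = v, [(k, y)]
                        elif v == best:
                            pairs.append((k, y))
                opt[(i, j, x)], kt[(i, j, x)] = best, pairs

    def decode(i: int, j: int, x: int) -> List[Structure]:
        if phrases[x][0] == i:                   # (1) of [T2']: <x>
            return [((x,), frozenset())]
        out: List[Structure] = []
        for k, y in kt[(i, j, x)]:               # (2) of [T2'], all tied pairs
            for ys, ya in decode(i, k, y):
                for xs, xa in decode(k + 1, j, x):
                    out.append((ys + xs, ya | xa | {(y, x)}))  # assumed Y ⊕ X
        return out

    finals = {x: opt[(1, N, x)] for x in B(1, N)}
    score = min(finals.values(), default=math.inf)
    if math.isinf(score):
        return score, []
    return score, [s for x, v in finals.items() if v == score for s in decode(1, N, x)]
```

Phrases are identified by their index in the input list, so each returned structure names the chosen phrase sequence and its dependency arcs directly.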

1.3.2. The Case of a Sequence of Bunsetsu-Phrase Sets

First, as in the case of the general bunsetsu-phrase lattice, by calculating

$$\min_{x\in B_N}\big[\mathrm{OPT}(1,N,x)\big]$$

the degree of acceptability of the optimum dependency structure on the optimum bunsetsu-phrase sequence is calculated. Furthermore, let

$$x_0=\mathop{\mathrm{argmin}}_{x\in B_N}\big[\mathrm{OPT}(1,N,x)\big];$$

then the optimum dependency structure on the optimum bunsetsu-phrase sequence is given by

$$\mathrm{OPTK}(1,N,x_0).$$

The structure of OPTK(1,N,x₀) is determined as follows.

If N=1, then according to (1) of [T2"],

$$\mathrm{OPTK}(1,N,x_0)=\langle x_0\rangle,$$

so the optimum dependency structure is determined.

If N≠1, then according to (2) of [T2"],

$$\mathrm{OPTK}(1,N,x_0)=\mathrm{OPTK}(1,k_1,x_1)\oplus\mathrm{OPTK}(k_1+1,N,x_0),$$

where k₁ is the optimum segmenting point and x₁ is the optimum bunsetsu-phrase for 1, N, x₀. If k₁≠1, by using the optimum segmenting point k₂ and the optimum bunsetsu-phrase x₂ for 1, k₁, x₁, OPTK(1,k₁,x₁) can be decomposed as follows:

$$\mathrm{OPTK}(1,k_1,x_1)=\mathrm{OPTK}(1,k_2,x_2)\oplus\mathrm{OPTK}(k_2+1,k_1,x_1).$$

In a similar manner, if N≠k₁+1, by using the optimum segmenting point k₃ and the optimum bunsetsu-phrase x₃ for k₁+1, N, x₀, OPTK(k₁+1,N,x₀) can be decomposed as follows:

$$\mathrm{OPTK}(k_1+1,N,x_0)=\mathrm{OPTK}(k_1+1,k_3,x_3)\oplus\mathrm{OPTK}(k_3+1,N,x_0).$$

Such decomposition operations are carried out until i=j holds for every OPTK(i,j,x) that appears in the process. Each OPTK(i,j,x) with i=j is replaced by the dependency structure consisting of only one bunsetsu-phrase according to (1) of [T2"], and then the insertion operations are carried out in the reverse order of the decomposition operations. Thus, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon are obtained simultaneously.

When there is more than one pair of the optimum segmenting point and the optimum bunsetsu-phrase, the same operations are carried out for all such pairs, and OPTK(1,N,x₀) consists of all the dependency structures thus obtained, as in the case of the general bunsetsu-phrase lattice.
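To illustrate the decomposition for a sequence of bunsetsu-phrase sets, here is a hypothetical Python helper, compose, that rebuilds OPTK(i,j,x) from a TABLE2 holding lists of (k, p) pairs (the multi-pair layout of item 2.4). The phrase $x_{j,q}$ is encoded as the pair (j, q) with q counted from 0, and Y ⊕ X is assumed to concatenate the two parts and add the arc from y to x; the small table used in the check is made up to show a tie.

```python
from typing import Dict, FrozenSet, List, Tuple

Node = Tuple[int, int]                        # x_(j),q encoded as (j, q)
Structure = Tuple[Tuple[Node, ...], FrozenSet[Tuple[Node, Node]]]
Table2 = Dict[Tuple[int, int, int], List[Tuple[int, int]]]


def compose(table2: Table2, i: int, j: int, q: int) -> List[Structure]:
    """Every optimum dependency structure in OPTK(i, j, x_(j),q)."""
    x = (j, q)
    if i == j:                                # (1) of [T2"]: the structure <x>
        return [((x,), frozenset())]
    result: List[Structure] = []
    for k, p in table2[(i, j, q)]:            # all optimum (k, p) pairs
        y = (k, p)
        for left_seq, left_arcs in compose(table2, i, k, p):
            for right_seq, right_arcs in compose(table2, k + 1, j, q):
                # assumed reading of Y ⊕ X: join the parts, add the arc y -> x
                result.append((left_seq + right_seq,
                               left_arcs | right_arcs | {(y, x)}))
    return result
```

Calling compose(table2, 1, N, q0), with q0 the index that minimizes TABLE1(1,N,q), returns one entry per optimum structure; with a tie at the top level, as in the check below, it returns two.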

The above and other objects, effects, features and advantages of the present invention will become more apparent from the following description of preferred embodiments thereof taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the first embodiment of an apparatus for carrying out the present invention;

FIGS. 2A and 2B show a flowchart representing an example of the control sequence in the first embodiment;

FIGS. 3A and 3B show an example of the construction of tables required for carrying out the control sequence defined in the flowchart shown in FIGS. 2A and 2B or 5A and 5B;

FIG. 4 is a block diagram showing the second embodiment of an apparatus for carrying out the present invention; and

FIGS. 5A and 5B show a flowchart representing an example of the control sequence in the second embodiment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

2. Apparatus and Sequences to Carry out the Invention

2.1. The Case of General Bunsetsu-Phrase Lattice

The first embodiment of an apparatus to carry out the present invention, based on item 1.2.1, is shown in FIG. 1.

In the following description, the phonetic symbol positions are represented by 1, 2, . . . , N; the number of elements of the bunsetsu-phrase set B(i,j) is represented by NUM(i,j); and the elements of B(i,j) are represented by $x_{i,j,1}, x_{i,j,2}, \ldots, x_{i,j,\mathrm{NUM}(i,j)}$.

In FIG. 1, SC designates a buffer memory such as a RAM for storing the value $S(x_{i,j,q})$, i.e. the degree of reliability of each bunsetsu-phrase, transferred through an input terminal i₁; and BUF, a buffer memory such as a RAM for holding the bunsetsu-phrase sets transferred through a bunsetsu-phrase input terminal i₂. When the present invention is applied to speech recognition, for instance, each bunsetsu-phrase of the bunsetsu-phrase lattice output by the speech recognizer is applied to the bunsetsu-phrase input terminal i₂, while the degree of reliability of each bunsetsu-phrase output by the speech recognizer is applied to the input terminal i₁. When the present invention is applied to Japanese word processing of the type in which a non-segmented Kana sentence is converted into a sentence consisting of Kana and Kanji, a morphological analysis of a given Kana string $a_1, a_2, \ldots, a_N$ is carried out by any suitable prior art method, and the bunsetsu-phrase candidates for each i, j (1≦i≦j≦N), each having the substring $a_i, a_{i+1}, \ldots, a_j$ of the original Kana string, are all enumerated and applied to the input terminal i₂. In this case, the degree of reliability of each bunsetsu-phrase, which is determined by such information as the frequency of use of a word, is applied to the input terminal i₁.

PE is a unit for calculating the degree of dependency between two bunsetsu-phrases x and y read out from BUF.

T1 and T2 are RAMs realizing the tables TABLE1 and TABLE2, respectively, in the flowchart shown in FIGS. 2A and 2B.

INIT is a detector unit for detecting whether or not the starting position of bunsetsu-phrase $x_{i,j,q}$ is equal to i, that is, whether or not $\mathrm{IN}(x_{i,j,q})=i$.

SEL is a data selector unit which, in response to a signal from INIT indicating that the starting position of bunsetsu-phrase $x_{i,j,q}$ is equal to i, selects $S(x_{i,j,q})$ from SC and writes the value into TABLE1(i,j,q) realized in T1.

ADD1 is an adder for adding the value stored in TABLE1(i,k,p) to the value of $\mathrm{PEN}(x_{i,k,p},x_{i,j,q})$.

MIN1 is a minimum value detector for detecting the minimum value of the output from the adder ADD1 as p is varied, and for detecting the p which gives said minimum value.

ADD2 is an adder for adding the output from the minimum value detector MIN1 to the value in TABLE1(k+1,j,q).

MIN2 is a minimum value detector for detecting the minimum value of the output from the adder ADD2 as k is varied, and for detecting the k which gives said minimum value.

CONT designates a control unit for controlling the whole apparatus to work in the predetermined operation sequence and comprises, for instance, a central processing unit CPU, a memory MEM1 in the form of a ROM for storing the control sequence, and a working memory MEM2 in the form of a RAM. The results of the calculation written into RAMs T1 and T2 are read out from output terminals O₁ and O₂, respectively.

FIGS. 2A and 2B show a flowchart illustrating an example of the control sequence stored beforehand in MEM1 in the first embodiment shown in FIG. 1, for obtaining the pairs of the optimum segmenting point and the optimum bunsetsu-phrase used to determine the optimum dependency structure on the optimum bunsetsu-phrase sequence and its degree of acceptability. This flowchart will be described first.

In addition to the flowchart shown in FIGS. 2A and 2B, two three-dimensional tables TABLE1(i,j,q) and TABLE2(i,j,q) (1≦i≦j≦N, 1≦q≦NUM(i,j)) are required which, as shown in FIGS. 3A and 3B, have the same number of rows and columns as the total number N of the phonetic symbol positions under consideration and the same number of sections as the number NUM(i,j) of elements of the bunsetsu-phrase set B(i,j). The arguments of each table represent the positions of the corresponding row, column and section, from left to right, respectively. TABLE1(i,j,q) is used to store the value of $\mathrm{OPT}(i,j,x_{i,j,q})$, while TABLE2(i,j,q) is used to store the pairs of the optimum segmenting point and the optimum bunsetsu-phrase number for $i, j, x_{i,j,q}$.

Operations in the flowchart proceed as follows:

① In steps S1-S13 of the flowchart shown in FIGS. 2A and 2B, the column number j of each table is incremented from 1 to N, and operation ② described below is carried out for each column.

② In steps S2-S11, the row number i is decremented from j to 1, and operation ③ below is executed for each row.

③ In steps S3-S9, q is incremented from 1 to NUM(i,j), and the following operations (1) and (2) are executed.

(1) If $\mathrm{IN}(x_{i,j,q})=i$ in step S4, the following operation [F1] is executed in step S7:

[F1] The value of $S(x_{i,j,q})$ is stored in TABLE1(i,j,q).

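(2) If $\mathrm{IN}(x_{i,j,q})>i$ in step S4, [F2] and [F3] below are executed in steps S5 and S6, respectively.

[F2]

$$\min_{i\le k\le \mathrm{IN}(x_{i,j,q})-1}\Big[\min_{1\le p\le \mathrm{NUM}(i,k)}\big[\mathrm{TABLE1}(i,k,p)+\mathrm{PEN}(x_{i,k,p},x_{i,j,q})\big]+\mathrm{TABLE1}(k+1,j,q)\Big]$$

is calculated and the value is stored in TABLE1(i,j,q).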

[F3] The pair (k,p) of k and p which attains the minimum value in [F2] is stored in TABLE2(i,j,q).

According to the above procedure, each row, each column and each section of TABLE1 and TABLE2 is filled with the calculated values sequentially.

When j>N in step S13, the calculation is completed, and the values of $\mathrm{OPT}(1,N,x_{1,N,q})$ (1≦q≦NUM(1,N)) are stored in TABLE1(1,N,q). Since the information concerning the optimum segmenting points and the optimum bunsetsu-phrase numbers is stored in TABLE2, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon can be composed from this information in the manner described in item 1.3.1.

2.2. The Case of a Sequence of Bunsetsu-Phrase Sets

The second embodiment of an apparatus to carry out the present invention is shown in FIG. 4.

Since a sequence of bunsetsu-phrase sets is a special form of a bunsetsu-phrase lattice, the above-described method for the general bunsetsu-phrase lattice is evidently also applicable to the present case. However, the following method, based on item 1.2.2, is more efficient.

In the following description, the positions of the bunsetsu-phrase sets are represented by 1, 2, . . . , N; the number of elements of the bunsetsu-phrase set $B_j$ by NUM(j); and the elements of $B_j$ by $x_{j,1}, x_{j,2}, \ldots, x_{j,\mathrm{NUM}(j)}$.

In FIG. 4, SC designates a buffer memory such as a RAM for storing the degree of reliability of each bunsetsu-phrase transferred through an input terminal i₁; and BUF, a buffer memory such as a RAM for storing the bunsetsu-phrase sets transferred through a bunsetsu-phrase input terminal i₂. When the present invention is applied to speech recognition, for instance, each bunsetsu-phrase candidate derived from a speech recognizer is transferred through the input terminal i₂, while the degree of reliability of the corresponding bunsetsu-phrase is transferred through the terminal i₁.

PE is a unit for calculating the degree of dependency PEN(x,y) between two bunsetsu-phrases x and y read out from BUF.

T1 and T2 are RAMs realizing the tables TABLE1 and TABLE2, respectively, in the flowchart shown in FIGS. 5A and 5B.

ADD1 is an adder for adding the value stored in TABLE1(i,k,p) to the value of $\mathrm{PEN}(x_{k,p},x_{j,q})$.

MIN1 is a minimum value detector for detecting the minimum value of the output from ADD1 as p is varied, and for detecting the p which attains the minimum value.

ADD2 is another adder for adding the output from the minimum value detector MIN1 to the value in TABLE1(k+1,j,q).

MIN2 is a minimum value detector for detecting the minimum value of the output from the adder ADD2 as k is varied, and for detecting the k which attains the minimum value.

CONT is a control unit for controlling the whole apparatus to work in the predetermined operation sequence and comprises, for instance, a central processing unit CPU, a memory MEM1 in the form of a ROM for storing the control sequence, and a working memory MEM2 in the form of a RAM. The results of the calculation written into RAMs T1 and T2 are read out from output terminals O₁ and O₂, respectively.

FIGS. 5A and 5B show a flowchart illustrating an example of the control sequence stored in MEM1 for obtaining the pairs of the optimum segmenting point and the optimum bunsetsu-phrase used to determine the optimum dependency structure on the optimum bunsetsu-phrase sequence and its degree of acceptability. This flowchart will be described below.

In addition to the flowchart shown in FIGS. 5A and 5B, two three-dimensional tables TABLE1(i,j,q) and TABLE2(i,j,q) (1≦i≦j≦N, 1≦q≦NUM(j)) are required, which have the same number of rows and columns as the length N of the sequence of bunsetsu-phrase sets under consideration and the same number of sections as the number of elements NUM(j) of the j-th bunsetsu-phrase set. The arguments of each table represent the positions of the corresponding row, column and section, from left to right, respectively. TABLE1(i,j,q) stores the value of $\mathrm{OPT}(i,j,x_{j,q})$, while TABLE2(i,j,q) stores the pairs of the optimum segmenting point and the optimum bunsetsu-phrase number for $i, j, x_{j,q}$.

Operations in the flowchart proceed as follows:

① In steps S1-S13 of the flowchart shown in FIGS. 5A and 5B, the column number j of each table is incremented from 1 to N, and operation ② described below is carried out for each column.

② In steps S2-S11, the row number i is decremented from j to 1, and operation ③ below is executed for each row.

③ In steps S3-S9, q is incremented from 1 to NUM(j), and operations (1) and (2) described below are executed.

(1) If j=i in step S4, the following operation is executed in step S7:

[F1'] The value of $S(x_{j,q})$ is stored in TABLE1(i,j,q).

(2) If j>i in step S4, [F2'] described below is executed in step S5, and [F3'] described below is executed in step S6.

[F2']

$$\min_{i\le k\le j-1}\Big[\min_{1\le p\le \mathrm{NUM}(k)}\big[\mathrm{TABLE1}(i,k,p)+\mathrm{PEN}(x_{k,p},x_{j,q})\big]+\mathrm{TABLE1}(k+1,j,q)\Big]$$

is calculated and the value is stored in TABLE1(i,j,q).

[F3'] The pair (k,p) of k and p which attains the minimum value in [F2'] is stored in TABLE2(i,j,q).

Each column, each row and each section of TABLE1 and TABLE2 is filled with the calculated values sequentially as described above.

When j>N in step S13, the calculation is completed, and the values of $\mathrm{OPT}(1,N,x_{N,q})$ (1≦q≦NUM(N)) are stored in TABLE1(1,N,q). Since the information concerning the optimum segmenting points and the optimum bunsetsu-phrase numbers is stored in TABLE2, the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon can be composed according to the method described in item 1.3.2 above.

2.3. Composition of the Optimum Dependency Structure

In order to actually carry out the present invention, a mechanism for composing the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon is needed in addition to the flowchart shown in FIGS. 2A and 2B or in FIGS. 5A and 5B. However, since the gist of the present invention lies in calculating the contents of TABLE1 and TABLE2, the description of that composing mechanism is limited to the scope of item 1.3.1 or 1.3.2. It should be noted that once the contents of TABLE1 and TABLE2 have been calculated, the greater part of the computation required for obtaining the optimum bunsetsu-phrase sequence and the optimum dependency structure thereon is complete.

2.4. Remarks on Non-uniqueness of the Pair of Optimum Segmenting Point and Optimum Bunsetsu-Phrase

Sometimes there is more than one pair of k and p attaining the minimum value in [F2] or [F2']. In this case, TABLE2(i,j,q) is designed to store more than one pair of numerical values, and all such pairs are stored in TABLE2(i,j,q) in [F3] or [F3']. Even if the flowchart shown in FIGS. 2A and 2B or in FIGS. 5A and 5B is modified to implement this, the amount of computation remains almost unchanged.

2.5. Features of the Invention

As described above, in selecting the optimum bunsetsu-phrase sequence from the given bunsetsu-phrase sets B(i,j) (1≦i≦j≦N), corresponding to phonetic symbol positions i and j, under the conditions that the starting position of the first bunsetsu-phrase equals 1, the ending position of the last bunsetsu-phrase equals N, and the ending position of each bunsetsu-phrase other than the last, plus 1, equals the starting position of the succeeding bunsetsu-phrase, and in obtaining the optimum dependency structure thereon and its degree of acceptability, the features of the present invention reside in that:

the acceptability of the optimum dependency structure and the information required for composing the optimum dependency structure are calculated and stored progressively from shorter intervals to longer intervals, with the last bunsetsu-phrase fixed; and

in calculating the acceptability of the optimum dependency structure and obtaining the information required for composing it, with the last bunsetsu-phrase fixed as $x_{i,j,q}$ for the interval [i, j] (1≦i≦j≦N), the only information referred to is the same kind of information already calculated and stored for the interval [i, k], the same kind of information already calculated and stored for the interval [k+1, j], and the degree of dependency between the bunsetsu-phrases $x_{i,k,p}\in B(i,k)$ and $x_{i,j,q}$, for every possible k and p.

The above embodiments have been described in terms of obtaining a minimum value because it is assumed that a smaller value of S implies a higher degree of reliability and a smaller value of PEN implies a higher degree of dependency between two bunsetsu-phrases. However, when a greater value of S implies a higher degree of reliability and a greater value of PEN implies a higher degree of dependency, a maximum value should be obtained instead of a minimum value.

3. Advantageous Effects of the Invention

The fundamental operations performed in the present invention are comparison and addition, so the number of these operations required by the method of the present invention is compared with that required by the enumeration method, which is the prior art method. In order to evaluate the number of operations, the following is assumed:

(1) In order to calculate PEN(x,y), J addition operations are required;

(2) In order to add m numerical values, m-1 addition operations are required;

(3) In order to find the minimum value among m numerical values, m-1 comparison operations are required; and

(4) The number of elements in a bunsetsu-phrase set ($B_k$ in the case of a sequence of bunsetsu-phrase sets, B(i,j) in the case of a bunsetsu-phrase lattice) is the same for all k, or for all i and j.

Then, the number of operations is determined by the following three parameters:

M: In the case of a bunsetsu-phrase lattice, the number of elements in B(i,j); in the case of a sequence of bunsetsu-phrase sets, the number of elements in $B_k$.

N: In the case of a bunsetsu-phrase lattice, the total length of the phonetic symbol string; in the case of a sequence of bunsetsu-phrase sets, the total length of the sequence.

J: The number of operations required for calculating PEN(x,y), expressed as a number of addition operations.

Under the above-mentioned assumption, the number of operations iscalculated as follows:

3.1. The Case of General Bunsetsu-Phrase Lattice

3.1.1. The Present Invention

(1) The number of addition operations

$$=M^2(J+1)\frac{(N-1)N(N+1)(N+2)(N+3)}{120}+M\frac{(N-1)N(N+1)(N+2)}{24}$$

(2) The number of comparison operations

$$=M^2\frac{(N-1)N(N+1)(N+2)(N+3)}{120}-M\frac{(N-1)N(N+1)}{6}$$
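These closed forms can be checked against TABLE 2 with a few lines of Python. The sketch below evaluates them with integer arithmetic; the subtracted term M(N−1)N(N+1)/6 in the comparison count is the one that reproduces the comparison columns of TABLE 2 (for example 1.3 × 10³ at M=5, N=5), and the function name is hypothetical.

```python
from typing import Tuple


def lattice_ops_present(M: int, N: int, J: int) -> Tuple[int, int]:
    """(additions, comparisons) of item 3.1.1 for a general lattice."""
    c5 = (N - 1) * N * (N + 1) * (N + 2) * (N + 3) // 120   # = C(N+3, 5)
    additions = M * M * (J + 1) * c5 + M * (N - 1) * N * (N + 1) * (N + 2) // 24
    comparisons = M * M * c5 - M * (N - 1) * N * (N + 1) // 6
    return additions, comparisons


print(lattice_ops_present(5, 5, 1))  # (2975, 1300)
```

Rounded to two significant digits, (2975, 1300) is the 3.0 × 10³ and 1.3 × 10³ in the first row of TABLE 2.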

3.1.2. The Enumeration Method (The Prior Art Method)

(1) The number of addition operations

$$=\sum_{n=0}^{N-1}\binom{N-1}{n}\big\{\mathrm{knum}(n+1)\cdot(J+1)\cdot n+n\big\}\cdot M^{n+1}$$

(2) The number of comparison operations

$$=\sum_{n=0}^{N-1}\binom{N-1}{n}\cdot\mathrm{knum}(n+1)\cdot M^{n+1}-1$$

where knum(n) represents the number of dependency structures on a sequence of bunsetsu-phrases of length n, as defined in the Summary of Invention.
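A companion sketch for the enumeration counts. Since knum(n) is defined outside this section, it is assumed here to be the Catalan number C(2n−2, n−1)/n (1, 1, 2, 5, 14, …); this assumption reproduces the enumeration rows of TABLE 2, for example 4.5 × 10⁵ additions and 5.8 × 10⁴ comparisons at M=5, N=5. Function names are hypothetical.

```python
from math import comb
from typing import Tuple


def knum(n: int) -> int:
    """ASSUMED: dependency structures on n phrases = Catalan number C_(n-1)."""
    return comb(2 * n - 2, n - 1) // n


def lattice_ops_enumeration(M: int, N: int, J: int) -> Tuple[int, int]:
    """(additions, comparisons) of item 3.1.2, summing over sequences of
    n + 1 bunsetsu-phrases (comb(N - 1, n) ways to place the n boundaries)."""
    additions = sum(
        comb(N - 1, n) * (knum(n + 1) * (J + 1) * n + n) * M ** (n + 1)
        for n in range(N))
    comparisons = sum(
        comb(N - 1, n) * knum(n + 1) * M ** (n + 1) for n in range(N)) - 1
    return additions, comparisons


print(lattice_ops_enumeration(5, 5, 1))  # (452800, 57854)
```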

The results of the calculations of these numbers of addition and comparison operations for J=1, M=5 and 10, and N=5, 10, 15 and 20 are shown in TABLE 2.

TABLE 2
The numbers of operations in the case of the bunsetsu-phrase lattice

Method               N    M=5 addition   M=5 comparison   M=10 addition   M=10 comparison
---------------------------------------------------------------------------------------
Present invention    5    3.0 × 10³      1.3 × 10³        1.2 × 10⁴       5.4 × 10³
Present invention   10    6.7 × 10⁴      3.1 × 10⁴        2.6 × 10⁵       1.3 × 10⁵
Present invention   15    4.4 × 10⁵      2.1 × 10⁵        1.7 × 10⁶       8.5 × 10⁵
Present invention   20    1.7 × 10⁶      8.3 × 10⁵        6.8 × 10⁶       3.4 × 10⁶
Enumeration method   5    4.5 × 10⁵      5.8 × 10⁴        1.3 × 10⁷       1.6 × 10⁶
Enumeration method  10    1.4 × 10¹²     8.0 × 10¹⁰       1.1 × 10¹⁵      6.3 × 10¹³
Enumeration method  15    4.6 × 10¹⁸     1.7 × 10¹⁷       1.1 × 10²³      3.9 × 10²¹
Enumeration method  20    1.7 × 10²⁵     4.6 × 10²³       1.1 × 10³¹      2.9 × 10²⁹

3.2. The Case of a Sequence of Bunsetsu-Phrase Sets

3.2.1. The Present Invention

(1) The number of addition operations

$$=\big((J+1)M+1\big)M\frac{N(N-1)(N+1)}{6}$$

(2) The number of comparison operations

$$=M^2\frac{N(N-1)(N+1)}{6}-M\frac{N(N-1)}{2}$$

3.2.2. The Enumeration Method (The Prior Art Method)

(1) The number of addition operations

$$=\big(\mathrm{knum}(N)\cdot(J+1)\cdot(N-1)+(N-1)\big)\cdot M^N$$

(2) The number of comparison operations

$$=\mathrm{knum}(N)\cdot M^N-1$$

TABLE 3
The numbers of operations in the case of a sequence of bunsetsu-phrase sets

Method               N    M=5 addition   M=5 comparison   M=10 addition   M=10 comparison
---------------------------------------------------------------------------------------
Present invention    5    1.1 × 10³      4.5 × 10²        4.2 × 10³       1.9 × 10³
Present invention   10    9.1 × 10³      3.9 × 10³        3.5 × 10⁴       1.6 × 10⁴
Present invention   15    3.1 × 10⁴      1.3 × 10⁴        1.2 × 10⁵       5.5 × 10⁴
Present invention   20    7.3 × 10⁴      3.2 × 10⁴        2.8 × 10⁵       1.3 × 10⁵
Enumeration method   5    3.6 × 10⁵      4.4 × 10⁴        1.2 × 10⁷       1.4 × 10⁶
Enumeration method  10    8.5 × 10¹¹     4.7 × 10¹⁰       8.8 × 10¹⁴      4.9 × 10¹³
Enumeration method  15    2.3 × 10¹⁸     8.2 × 10¹⁶       7.5 × 10²²      2.7 × 10²¹
Enumeration method  20    6.4 × 10²⁴     1.7 × 10²³       6.7 × 10³⁰      1.8 × 10²⁹

The results of the calculations of these numbers of addition and comparison operations for J=1, M=5 and 10, and N=5, 10, 15 and 20 are shown in TABLE 3.
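The sequence-case counts of items 3.2.1 and 3.2.2 can be sketched the same way. Here knum is again assumed to be the Catalan-number count C(2n−2, n−1)/n, which reproduces TABLE 3 (for example 3.6 × 10⁵ additions and 4.4 × 10⁴ comparisons for the enumeration method at M=5, N=5); the function names are hypothetical.

```python
from math import comb
from typing import Tuple


def knum(n: int) -> int:
    """ASSUMED: number of dependency structures on n phrases (Catalan C_(n-1))."""
    return comb(2 * n - 2, n - 1) // n


def sequence_ops(M: int, N: int, J: int) -> Tuple[int, int, int, int]:
    """(additions, comparisons) of item 3.2.1, then those of item 3.2.2."""
    present_add = ((J + 1) * M + 1) * M * N * (N - 1) * (N + 1) // 6
    present_cmp = M * M * N * (N - 1) * (N + 1) // 6 - M * N * (N - 1) // 2
    enum_add = (knum(N) * (J + 1) * (N - 1) + (N - 1)) * M ** N
    enum_cmp = knum(N) * M ** N - 1
    return present_add, present_cmp, enum_add, enum_cmp


print(sequence_ops(5, 5, 1))  # (1100, 450, 362500, 43749)
```

The polynomial growth of the first two entries against the exponential growth of the last two is the contrast summarized in TABLE 3.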

As is apparent from TABLE 2 and TABLE 3, the advantage of the present invention grows as M and N become larger. For instance, in the case of J=1, M=10 and N=20, the present invention reduces the number of operations to about 1/10²⁴ to 1/10²⁵ of that required by the enumeration method.

What is claimed is:
 1. In a language processing method wherein, when phonetic symbol positions are represented by natural numbers from 1 to N, a set of bunsetsu-phrases, in each element of which the starting position of the phonetic expression and the ending position thereof are defined at various positions within a range from 1 to N, and numerical values each indicating a degree of reliability of each bunsetsu-phrase are given, optimum bunsetsu-phrase sequences are selected, and the optimum dependency structure thereon and a degree of acceptability thereof are calculated, from all the possible bunsetsu-phrase sequences obtained by arranging the bunsetsu-phrases so as to satisfy the conditions that the starting position of the phonetic expression of the first bunsetsu-phrase is equal to 1 while the ending position of the phonetic expression of the last bunsetsu-phrase is equal to N, and that the value obtained by adding 1 to the ending position of the phonetic expression of each bunsetsu-phrase except the last bunsetsu-phrase equals the starting position of the phonetic expression of the succeeding bunsetsu-phrase, under the criterion of optimality that the sum of the numerical values representative of the degree of dependency between two bunsetsu-phrases and the numerical values representative of the degree of reliability of each bunsetsu-phrase has a minimum or maximum value, said language processing method comprising the steps of:
preparing a first and a second two-dimensional triangular matrix table, each having N rows and N columns;
dividing each square of said first and second tables into sections equal in number to the bunsetsu-phrases in each of which the ending position of the phonetic expression is equal to the number of the row and the starting position thereof is not less than the number of the column, thereby three-dimensionalizing said first and second tables;
when the q-th bunsetsu-phrase in a set of bunsetsu-phrases in which the starting position of the phonetic expression is not less than a natural number i while the ending position thereof is equal to a natural number j has the starting position of the phonetic expression equal to i, storing the degree of reliability of said q-th bunsetsu-phrase into the address designated by the i-th row, the j-th column and the q-th section of said first table;
with respect to natural numbers k ranging from i to the number obtained by subtracting 1 from the starting position of the phonetic expression of the q-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to j, storing calculated values to the address of said first table designated by the i-th row, the k-th column and each section and to the address of said first table designated by the (k+1)-th row, the j-th column and the q-th section;
after said storage, calculating the sum of the value stored in the address of said first table designated by the i-th row, the k-th column and the p-th section, the value stored in the address of said first table designated by the (k+1)-th row, the j-th column and the q-th section, and the degree of dependency between the p-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to k and the q-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to j;
storing the minimum or maximum value of said sum with respect to said k and said p to the address of said first table designated by the i-th row, the j-th column and the q-th section;
storing the pair of said optimum segmenting point k and said optimum bunsetsu-phrase number p attaining said minimum or maximum value to the address of said second table designated by the i-th row, the j-th column and the q-th section;
filling said first and second tables sequentially with the calculated values;
searching for the minimum or maximum value among the values stored in the sections of the address designated by the first row and the N-th column of said first table, thereby obtaining the acceptability of the optimum dependency structure on the optimum bunsetsu-phrase sequence and the number of the last bunsetsu-phrase in said optimum bunsetsu-phrase sequence; and
obtaining, from said second table, all pairs of the optimum segmenting point and the optimum bunsetsu-phrase number required for composing the optimum dependency structure on the optimum bunsetsu-phrase sequence.
 2. A language processing apparatus wherein, when phonetic symbol positions are represented by natural numbers from 1 to N, a set of bunsetsu-phrases, in each element of which the starting position of the phonetic expression and the ending position thereof are defined at various positions within a range from 1 to N, and numerical values each indicating a degree of reliability of each bunsetsu-phrase are given, optimum bunsetsu-phrase sequences are selected, and the optimum dependency structure thereon and a degree of acceptability thereof are calculated, from all the possible bunsetsu-phrase sequences obtained by arranging the bunsetsu-phrases so as to satisfy the conditions that the starting position of the phonetic expression of the first bunsetsu-phrase is equal to 1 while the ending position of the phonetic expression of the last bunsetsu-phrase is equal to N, and that the value obtained by adding 1 to the ending position of the phonetic expression of each bunsetsu-phrase except the last bunsetsu-phrase equals the starting position of the phonetic expression of the succeeding bunsetsu-phrase, under the criterion of optimality that the sum of the numerical values representative of the degree of dependency between two bunsetsu-phrases and the numerical values representative of the degree of reliability of each bunsetsu-phrase has a minimum or maximum value, said language processing apparatus comprising:
means for storing a first and a second two-dimensional triangular matrix table, each having N rows and N columns, each square of said first and second tables being divided into sections equal in number to the bunsetsu-phrases in each of which the ending position of the phonetic expression is equal to the number of the row and the starting position thereof is not less than the number of the column, thereby three-dimensionalizing said first and second tables;
means for, when the q-th bunsetsu-phrase in a set of bunsetsu-phrases in which the starting position of the phonetic expression is not less than a natural number i while the ending position thereof is equal to a natural number j has the starting position of the phonetic expression equal to i, storing the degree of reliability of said q-th bunsetsu-phrase into the address designated by the i-th row, the j-th column and the q-th section of said first table;
means for, with respect to natural numbers k ranging from i to the number obtained by subtracting 1 from the starting position of the phonetic expression of the q-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to j, storing calculated values to the address of said first table designated by the i-th row, the k-th column and each section and to the address of said first table designated by the (k+1)-th row, the j-th column and the q-th section;
means for, after said storage, calculating the sum of the value stored in the address of said first table designated by the i-th row, the k-th column and the p-th section, the value stored in the address of said first table designated by the (k+1)-th row, the j-th column and the q-th section, and the degree of dependency between the p-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to k and the q-th bunsetsu-phrase in the set of bunsetsu-phrases each element of which has a starting position not less than i and an ending position equal to j;
means for storing the minimum or maximum value of said sum with respect to said k and said p to the address of said first table designated by the i-th row, the j-th column and the q-th section;
means for storing the pair of said optimum segmenting point k and said optimum bunsetsu-phrase number p attaining said minimum or maximum value to the address of said second table designated by the i-th row, the j-th column and the q-th section;
means for filling said first and second tables sequentially with the calculated values;
means for searching for the minimum or maximum value among the values stored in the sections of the address designated by the first row and the N-th column of said first table, thereby obtaining the acceptability of the optimum dependency structure on the optimum bunsetsu-phrase sequence and the number of the last bunsetsu-phrase in said optimum bunsetsu-phrase sequence; and
means for obtaining, from said second table, all pairs of the optimum segmenting point and the optimum bunsetsu-phrase number required for composing the optimum dependency structure on the optimum bunsetsu-phrase sequence.
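The table-filling procedure recited in claims 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `parse_lattice`, `Phrase`, `best`, `back` and the `dep` callback are hypothetical. Dictionaries keyed by (i, j, q) stand in for the three-dimensionalized first and second tables, and q is a global phrase index rather than a per-square section number. The degree of dependency between a modifier p and its head q is assumed to be supplied by the caller as a function.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: each bunsetsu-phrase candidate is
# (start, end, reliability), with phonetic positions numbered 1..N inclusive.
Phrase = Tuple[int, int, float]


def parse_lattice(
    n: int,
    phrases: List[Phrase],
    dep: Callable[[int, int], float],
    maximize: bool = True,
) -> Optional[Tuple[float, List[int], List[Tuple[int, int]]]]:
    """Return (acceptability, phrase sequence, (modifier, head) arcs), or None."""
    better = (lambda a, b: a > b) if maximize else (lambda a, b: a < b)
    # Stand-in for the first table: best[(i, j, q)] is the optimal score of a
    # structure covering positions i..j whose last phrase q ends at j, starts >= i.
    best: Dict[Tuple[int, int, int], float] = {}
    # Stand-in for the second table: the optimal (k, p), or None if q alone covers i..j.
    back: Dict[Tuple[int, int, int], Optional[Tuple[int, int]]] = {}
    by_end: Dict[int, List[int]] = {}
    for idx, (_, end, _) in enumerate(phrases):
        by_end.setdefault(end, []).append(idx)

    for length in range(1, n + 1):  # shorter spans are filled before longer ones
        for i in range(1, n - length + 2):
            j = i + length - 1
            for q in by_end.get(j, []):
                start_q, _, rel_q = phrases[q]
                if start_q < i:
                    continue
                if start_q == i:  # q covers i..j by itself: store its reliability
                    best[(i, j, q)] = rel_q
                    back[(i, j, q)] = None
                    continue
                cand: Optional[Tuple[float, int, int]] = None
                for k in range(i, start_q):  # segmenting point k = i .. start_q - 1
                    right = best.get((k + 1, j, q))
                    if right is None:
                        continue
                    for p in by_end.get(k, []):  # p ends at k and depends on q
                        left = best.get((i, k, p))
                        if left is None:
                            continue
                        score = left + right + dep(p, q)
                        if cand is None or better(score, cand[0]):
                            cand = (score, k, p)
                if cand is not None:
                    best[(i, j, q)] = cand[0]
                    back[(i, j, q)] = (cand[1], cand[2])

    finals = [q for q in by_end.get(n, []) if (1, n, q) in best]
    if not finals:
        return None  # no phrase sequence covers positions 1..N
    last = finals[0]
    for q in finals[1:]:
        if better(best[(1, n, q)], best[(1, n, last)]):
            last = q

    def trace(i: int, j: int, q: int) -> Tuple[List[int], List[Tuple[int, int]]]:
        # Rebuild the phrase sequence and dependency arcs from the stored (k, p) pairs.
        ptr = back[(i, j, q)]
        if ptr is None:
            return [q], []
        k, p = ptr
        left_seq, left_arcs = trace(i, k, p)
        right_seq, right_arcs = trace(k + 1, j, q)
        return left_seq + right_seq, left_arcs + [(p, q)] + right_arcs

    seq, arcs = trace(1, n, last)
    return best[(1, n, last)], seq, arcs


# Toy lattice over N = 4 positions; the dependency scores are made up.
example_phrases = [(1, 2, 1.0), (3, 4, 1.0), (1, 1, 0.5), (2, 4, 0.25)]
example_dep = lambda p, q: 1.0 if (p, q) == (0, 1) else 0.0
print(parse_lattice(4, example_phrases, example_dep))  # (3.0, [0, 1], [(0, 1)])
```

Because every span i..j is computed only after all shorter spans, each entry is evaluated once and then reused when longer spans are built. This is the repeated-calculation saving the claims rely on.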